36 research outputs found

    Singing voice analysis/synthesis

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2003. Includes bibliographical references (p. 109-115).

    The singing voice is the oldest and most variable of musical instruments. By combining music, lyrics, and expression, the voice is able to affect us in ways that no other instrument can. As listeners, we are innately drawn to the sound of the human voice, and when present it is almost always the focal point of a musical piece. But the acoustic flexibility of the voice in intimating words, shaping phrases, and conveying emotion also makes it the most difficult instrument to model computationally. Moreover, while all voices are capable of producing the common sounds necessary for language understanding and communication, each voice possesses distinctive features independent of phonemes and words. These unique acoustic qualities are the result of a combination of innate physical factors and expressive characteristics of performance, reflecting an individual's vocal identity.

    A great deal of prior research has focused on speech recognition and speaker identification, but relatively little work has been performed specifically on singing. There are significant differences between speech and singing in terms of both production and perception. Traditional computational models of speech have focused on the intelligibility of language, often sacrificing sound quality for model simplicity. Such models, however, are detrimental to the goal of singing, which relies on acoustic authenticity for the non-linguistic communication of expression and emotion. These differences between speech and singing dictate that a different and specialized representation is needed to capture the sound quality and musicality most valued in singing.

    This dissertation proposes an analysis/synthesis framework specifically for the singing voice that models the time-varying physical and expressive characteristics unique to an individual voice. The system operates by jointly estimating source-filter voice model parameters, representing vocal physiology, and modeling the dynamic behavior of these features over time to represent aspects of expression. This framework is demonstrated to be useful for several applications, such as singing voice coding, automatic singer identification, and voice transformation.

    by Youngmoo Edmund Kim. Ph.D.
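    As a concrete illustration of the source-filter idea the framework builds on, the sketch below runs a frame-wise linear-predictive (LPC) analysis that separates an all-pole vocal-tract filter from an excitation residual. The frame size, hop, model order, and input file name are illustrative assumptions, not the dissertation's actual system.

```python
# Minimal sketch: frame-wise source-filter (LPC) analysis of a voice
# recording. Frame length, hop, model order, and the input file name are
# illustrative assumptions, not the dissertation's actual parameters.
import librosa
from scipy.signal import lfilter

y, sr = librosa.load("vocal.wav", sr=16000)  # hypothetical input file

FRAME, HOP, ORDER = 512, 256, 12
frames = librosa.util.frame(y, frame_length=FRAME, hop_length=HOP).T

filters, residuals = [], []
for frame in frames:
    a = librosa.lpc(frame, order=ORDER)  # all-pole vocal-tract filter
    e = lfilter(a, [1.0], frame)         # inverse filtering -> excitation residual
    filters.append(a)
    residuals.append(e)

# A dynamic model of expression would then describe how `filters`
# (physiology) and `residuals` (excitation) evolve over time.
```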

    Predicting Audio Advertisement Quality

    Full text link
    Online audio advertising is a particular form of advertising used abundantly in online music streaming services. In these platforms, which tend to host tens of thousands of unique audio advertisements (ads), providing high-quality ads ensures a better user experience and results in longer user engagement. Therefore, the automatic assessment of these ads is an important step toward audio ad ranking and better audio ad creation. In this paper we propose one way to measure the quality of audio ads using a proxy metric called Long Click Rate (LCR), defined as the amount of time a user engages with the follow-up display ad (shown while the audio ad is playing) divided by the number of impressions. We then focus on predicting audio ad quality using only acoustic features such as harmony, rhythm, and timbre, extracted from the raw waveform. We discuss how the characteristics of the sound can be connected to concepts such as the clarity of the audio ad's message, its trustworthiness, etc. Finally, we propose a new deep learning model for audio ad quality prediction, which outperforms the other discussed models trained on hand-crafted features. To the best of our knowledge, this is the first large-scale audio ad quality prediction study.

    Comment: WSDM '18 Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 9 pages
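    As described, the LCR proxy reduces to a simple ratio over the impression log. The sketch below assumes a hypothetical per-impression record of display-ad engagement time; the actual schema, and any thresholding the paper applies, may differ.

```python
# Minimal sketch of the Long Click Rate (LCR) proxy as described above:
# time spent on the follow-up display ad, divided by impressions. The
# record type and field names are hypothetical, not the paper's schema.
from dataclasses import dataclass

@dataclass
class Impression:
    ad_id: str
    display_engagement_s: float  # seconds on the companion display ad

def long_click_rate(log: list[Impression], ad_id: str) -> float:
    """LCR for one audio ad: engagement time / number of impressions."""
    hits = [imp for imp in log if imp.ad_id == ad_id]
    if not hits:
        return 0.0
    return sum(imp.display_engagement_s for imp in hits) / len(hits)
```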

    Comparison of Brain Activation during Motor Imagery and Motor Movement Using fNIRS

    Get PDF
    Motor-activity-related mental tasks are widely adopted for brain-computer interfaces (BCIs) as they are a natural extension of movement intention, requiring no training to evoke brain activity. The ideal BCI aims to eliminate neuromuscular movement, making motor imagery tasks, or imagined actions with no muscle movement, good candidates. This study explores cortical activation differences between motor imagery and motor execution for both upper and lower limbs using functional near-infrared spectroscopy (fNIRS). Four simple finger- or toe-tapping tasks (left hand, right hand, left foot, and right foot) were performed with both motor imagery and motor execution and compared to resting state. Significant activation was found during all four motor imagery tasks, indicating that they can be detected via fNIRS. Motor execution produced higher activation levels, a faster response, and a different spatial distribution compared to motor imagery, which should be taken into account when designing an imagery-based BCI. When comparing left versus right, upper limb tasks are the most clearly distinguishable, particularly during motor execution. Left and right lower limb activation patterns were found to be highly similar during both imagery and execution, indicating that higher resolution imaging, advanced signal processing, or improved subject training may be required to reliably distinguish them
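    For intuition, a task-versus-rest activation test of the kind reported above can be run channel by channel. The sketch below uses synthetic mean-HbO values and a paired t-test with Bonferroni correction; the shapes, values, and test choice are assumptions for illustration, not the study's actual pipeline.

```python
# Minimal sketch: channel-wise comparison of mean HbO during a motor task
# versus rest. Array shapes, values, and the paired t-test are illustrative
# assumptions, not the study's actual analysis pipeline.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_channels = 10, 20
task_hbo = rng.normal(0.5, 1.0, (n_subjects, n_channels))  # task means
rest_hbo = rng.normal(0.0, 1.0, (n_subjects, n_channels))  # rest means

t, p = stats.ttest_rel(task_hbo, rest_hbo, axis=0)  # paired test per channel
significant = p < 0.05 / n_channels                 # Bonferroni correction
print("significantly active channels:", np.flatnonzero(significant))
```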

    Anodal tDCS to right dorsolateral prefrontal cortex facilitates performance for novice jazz improvisers but hinders experts

    Get PDF
    Research on creative cognition reveals a fundamental disagreement about the nature of creative thought, specifically, whether it is primarily based on automatic, associative (Type-1) or executive, controlled (Type-2) processes. We hypothesized that Type-1 and Type-2 processes make differential contributions to creative production that depend on domain expertise. We tested this hypothesis with jazz pianists whose expertise was indexed by the number of public performances given. Previous fMRI studies of musical improvisation have reported that domain expertise is characterized by deactivation of the right-dorsolateral prefrontal cortex (r-DLPFC), a brain area associated with Type-2 executive processing. We used anodal, cathodal, and sham transcranial direct-current stimulation (tDCS) applied over r-DLPFC with the reference electrode on the contralateral mastoid (1.5 mA for 15 min, except for sham) to modulate the quality of the pianists' performances while they improvised over chords with drum and bass accompaniment. Jazz experts rated each improvisation for creativity, aesthetic appeal, and technical proficiency. There was no main effect of anodal or cathodal stimulation on ratings compared to sham; however, a significant interaction between anodal tDCS and expertise emerged such that stimulation benefited musicians with less experience but hindered those with more experience. We interpret these results as evidence for a dual-process model of creativity in which novices and experts differentially engage Type-1 and Type-2 processes during creative production
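    The key result is an interaction rather than a main effect, which in regression terms means the slope of the stimulation effect varies with expertise. The sketch below fits such an interaction model; column names, values, and the OLS formulation are illustrative assumptions, not the paper's statistical analysis.

```python
# Minimal sketch of testing a stimulation-by-expertise interaction on
# creativity ratings. Column names, values, and the OLS model are
# illustrative assumptions, not the paper's statistical analysis.
import pandas as pd
import statsmodels.formula.api as smf

ratings = pd.DataFrame({
    "rating":    [5.1, 6.3, 4.2, 6.8, 5.9, 4.0, 6.1, 3.9],
    "condition": ["anodal", "sham"] * 4,                # tDCS condition
    "expertise": [10, 10, 200, 200, 15, 15, 180, 180],  # public performances
})

# "condition * expertise" expands to both main effects plus the interaction,
# which is the term of interest for an expertise-dependent tDCS effect.
fit = smf.ols("rating ~ C(condition) * expertise", data=ratings).fit()
print(fit.summary())
```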

    Structured Encoding Of The Singing Voice Using Prior Knowledge Of The Musical Score

    No full text
    The human voice is the most difficult musical instrument to simulate convincingly. Yet a great deal of progress has been made in voice coding, the parameterization and re-synthesis of a source signal according to an assumed voice model. Source-filter models of the human voice, particularly Linear Predictive Coding (LPC), are the basis of most low-bitrate (speech) coding techniques in use today. This paper introduces a technique for coding the singing voice using LPC and prior knowledge of the musical score to aid in the process of encoding, reducing the amount of data required to represent the voice. This approach advances the singing voice closer towards a structured audio model in which musical parameters such as pitch, duration, and phonemes are represented orthogonally to the synthesis technique and can thus be modified prior to re-synthesis.

    1. INTRODUCTION

    A great deal of research (primarily dealing with speech) has focused on voice coding using an analysis/re-synthesis appr..
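    One way a score prior can reduce the data rate is to transmit per-frame pitch as a small deviation from the known score note rather than as an absolute value. The sketch below illustrates that idea; the note and pitch values are invented, and this is not the paper's actual coder.

```python
# Minimal sketch of one way score knowledge can shrink the encoding:
# per-frame pitch as a deviation (in cents) from the known score note.
# The note and f0 values are invented for illustration; this is not the
# paper's actual coder.
import numpy as np

def hz_to_midi(f_hz):
    return 69.0 + 12.0 * np.log2(np.asarray(f_hz) / 440.0)

score_note = 60                                    # C4, known from the score
frame_f0 = np.array([261.0, 262.5, 263.1, 260.2])  # estimated f0 per frame (Hz)

deviation_cents = 100.0 * (hz_to_midi(frame_f0) - score_note)
print(np.round(deviation_cents, 1))  # small values quantize to few bits

# The decoder adds the deviations back to the score note to recover pitch
# before LPC re-synthesis.
```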

    Singer Identification in Popular Music Recordings Using Voice Coding Features

    No full text
    In most popular music, the vocals sung by the lead singer are the focal point of the song. The unique qualities of a singer's voice make it relatively easy for us to identify a song as belonging to that particular artist. With little training, if one is familiar with a particular singer's voice, one can usually recognize that voice in other pieces, even when hearing a song for the first time. The research presented in this paper attempts to automatically establish the identity of a singer using acoustic features extracted from songs in a database of popular music. As a first step, an untrained algorithm for automatically extracting vocal segments from within songs is presented. Once these vocal segments are identified, they are presented to a singer identification system that has been trained on data taken from other songs by the same artists in the database
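    A common back end for this kind of system is one generative model per singer over short-time acoustic features, with the most likely model winning at test time. The sketch below uses MFCC frames and Gaussian mixtures; the feature choice, mixture size, and file names are assumptions, and it omits the vocal-segment extraction step the paper performs first.

```python
# Minimal sketch of a singer-identification back end: one Gaussian mixture
# per singer over MFCC frames, with the most likely model winning. Feature
# choice, mixture size, and file names are illustrative assumptions, not
# the paper's exact system.
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path):
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # (frames, 13)

train = {"singer_a": "a_song.wav", "singer_b": "b_song.wav"}  # hypothetical
models = {
    name: GaussianMixture(n_components=8).fit(mfcc_frames(path))
    for name, path in train.items()
}

def identify(path):
    x = mfcc_frames(path)  # ideally restricted to the song's vocal segments
    return max(models, key=lambda name: models[name].score(x))
```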

    Musical instrument identification: A pattern-recognition approach

    No full text
    A statistical pattern-recognition technique was applied to the classification of musical instrument tones within a taxonomic hierarchy. Perceptually salient acoustic features---related to the physical properties of source excitation and resonance structure---were measured from the output of an auditory model (the log-lag correlogram) for 1023 isolated tones over the full pitch ranges of 15 orchestral instruments. The data set included examples from the string (bowed and plucked), woodwind (single, double, and air reed), and brass families. Using 70%/30% splits between training and test data, maximum a posteriori classifiers were constructed based on Gaussian models arrived at through Fisher multiple-discriminant analysis. The classifiers distinguished transient from continuant tones with approximately 99% correct performance. Instrument families were identified with approximately 90% performance, and individual instruments were identified with an overall success rate of appr..
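    The classifier construction described above maps naturally onto a discriminant projection followed by a Gaussian MAP decision. The sketch below pairs Fisher (linear) discriminant analysis with a diagonal-covariance Gaussian classifier on a 70/30 split; the features are synthetic stand-ins for the correlogram-derived measurements, so the printed accuracy is chance level.

```python
# Minimal sketch of the classifier construction described above: Fisher
# discriminant projection followed by a Gaussian maximum-a-posteriori
# classifier (diagonal covariance here) on a 70/30 split. The features are
# synthetic stand-ins for the paper's correlogram-derived measurements.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(1023, 31))     # stand-in acoustic feature vectors
y = rng.integers(0, 15, size=1023)  # 15 instrument labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = make_pipeline(LinearDiscriminantAnalysis(n_components=14), GaussianNB())
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```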